{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extending our PyTorch containers\n",
    "\n",
    "With Amazon SageMaker, you can package your own algorithms that can then be trained and deployed in the SageMaker environment. This notebook guides you through an example on how to extend one of our existing and predefined SageMaker deep learning framework containers.\n",
    "\n",
    "By packaging an algorithm in a container, you can bring almost any code to the Amazon SageMaker environment, regardless of programming language, environment, framework, or dependencies. \n",
    "\n",
    "1. [Extending our PyTorch containers](#Extending-our-pytorch-containers)\n",
    "  1. [When should I extend a SageMaker container?](#When-should-I-extend-a-SageMaker-container?)\n",
    "  1. [Permissions](#Permissions)\n",
    "  1. [The example](#The-example)\n",
    "  1. [The presentation](#The-presentation)\n",
    "1. [Part 1: Packaging and Uploading your Algorithm for use with Amazon SageMaker](#Part-1:-Packaging-and-Uploading-your-Algorithm-for-use-with-Amazon-SageMaker)\n",
    "    1. [An overview of Docker](#An-overview-of-Docker)\n",
    "    1. [How Amazon SageMaker runs your Docker container](#How-Amazon-SageMaker-runs-your-Docker-container)\n",
    "      1. [Running your container during training](#Running-your-container-during-training)\n",
    "        1. [The input](#The-input)\n",
    "        1. [The output](#The-output)\n",
    "      1. [Running your container during hosting](#Running-your-container-during-hosting)\n",
    "    1. [The parts of the sample container](#The-parts-of-the-sample-container)\n",
    "    1. [The Dockerfile](#The-Dockerfile)\n",
    "    1. [Building and registering the container](#Building-and-registering-the-container)\n",
    "  1. [Testing your algorithm on your local machine](#Testing-your-algorithm-on-your-local-machine)\n",
    "  1. [Download the CIFAR-10 dataset](#Download-the-CIFAR-10-dataset)\n",
    "  1. [SageMaker Python SDK Local Training](#SageMaker-Python-SDK-Local-Training)\n",
    "  1. [Fit, Deploy, Predict](#Fit,-Deploy,-Predict)\n",
    "  1. [Making predictions using Python SDK](#Making-predictions-using-Python-SDK)\n",
    "1. [Part 2: Training and Hosting your Algorithm in Amazon SageMaker](#Part-2:-Training-and-Hosting-your-Algorithm-in-Amazon-SageMaker)\n",
    "  1. [Set up the environment](#Set-up-the-environment)\n",
    "  1. [Create the session](#Create-the-session)\n",
    "  1. [Upload the data for training](#Upload-the-data-for-training)\n",
    "  1. [Training On SageMaker](#Training-on-SageMaker)\n",
    "  1. [Optional cleanup](#Optional-cleanup)  \n",
    "1. [Reference](#Reference)\n",
    "\n",
    "_or_ I'm impatient, just [let me see the code](#The-Dockerfile)!\n",
    "\n",
    "## When should I extend a SageMaker container?\n",
    "\n",
    "You may not need to create a container to bring your own code to Amazon SageMaker. When you are using a framework such as [TensorFlow](https://github.com/aws/sagemaker-tensorflow-container), [MXNet](https://github.com/aws/sagemaker-mxnet-container), [PyTorch](https://github.com/aws/sagemaker-pytorch-container) or [Chainer](https://github.com/aws/sagemaker-chainer-container) that has direct support in SageMaker, you can simply supply the Python code that implements your algorithm using the SDK entry points for that framework.\n",
    "\n",
    "Even if there is direct SDK support for your environment or framework, you may want to add additional functionality or configure your container environment differently while utilizing our container to use on SageMaker.\n",
    "\n",
    "**Some of the reasons to extend a SageMaker deep learning framework container are:**\n",
    "1. Install additional dependencies. (E.g. I want to install a specific Python library, that the current SageMaker containers don't install.)\n",
    "2. Configure your environment. (E.g. I want to add an environment variable to my container.)\n",
    "\n",
    "**Although it is possible to extend any of our framework containers as a parent image, the example this notebook covers is currently only intended to work with our PyTorch (0.4.0+) and Chainer (4.1.0+) containers.**\n",
    "\n",
    "This walkthrough shows that it is quite straightforward to extend one of our containers to build your own custom container for PyTorch or Chainer.\n",
    "\n",
    "## Permissions\n",
    "\n",
    "Running this notebook requires permissions in addition to the normal `SageMakerFullAccess` permissions. This is because it creates new repositories in Amazon ECR. The easiest way to add these permissions is simply to add the managed policy `AmazonEC2ContainerRegistryFullAccess` to the role that you used to start your notebook instance. There's no need to restart your notebook instance when you do this, the new permissions will be available immediately.\n",
    "\n",
    "## The example\n",
    "\n",
    "In this example we show how to package a PyTorch container, extending the SageMaker PyTorch container, with a Python example which works with the CIFAR-10 dataset. By extending the SageMaker PyTorch container we can utilize the existing training and hosting solution made to work on SageMaker. By comparison, if one were to build their own custom framework container from scratch, they would need to implement a training and hosting solution in order to use SageMaker. Here is an example showing [how to create a SageMaker TensorFlow container from scratch](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/tensorflow_bring_your_own/tensorflow_bring_your_own.ipynb).\n",
    "\n",
    "In this example, we use a single image to support training and hosting. This simplifies the procedure because we only need to manage one image for both tasks. Sometimes you may want separate images for training and hosting because they have different requirements. In this case, separate the parts discussed below into separate Dockerfiles and build two images. Choosing whether to use a single image or two images is a matter of what is most convenient for you to develop and manage.\n",
    "\n",
    "If you're only using Amazon SageMaker for training or hosting, but not both, only the functionality used needs to be built into your container.\n",
    "\n",
    "[CIFAR-10]: http://www.cs.toronto.edu/~kriz/cifar.html\n",
    "\n",
    "## The presentation\n",
    "\n",
    "This presentation is divided into two parts: _building_ the container and _using_ the container."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1: Packaging and Uploading your Algorithm for use with Amazon SageMaker\n",
    "\n",
    "### An overview of Docker\n",
    "\n",
    "If you're familiar with Docker already, you can skip ahead to the next section.\n",
    "\n",
    "For many data scientists, Docker containers are a new technology. But they are not difficult and can significantly simplify the deployment of your software packages. \n",
    "\n",
    "Docker provides a simple way to package arbitrary code into an _image_ that is totally self-contained. Once you have an image, you can use Docker to run a _container_ based on that image. Running a container is just like running a program on the machine except that the container creates a fully self-contained environment for the program to run. Containers are isolated from each other and from the host environment, so the way your program is set up is the way it runs, no matter where you run it.\n",
    "\n",
    "Docker is more powerful than environment managers like conda or virtualenv because (a) it is completely language independent and (b) it comprises your whole operating environment, including startup commands, and environment variable.\n",
    "\n",
    "A Docker container is like a virtual machine, but it is much lighter weight. For example, a program running in a container can start in less than a second and many containers can run simultaneously on the same physical or virtual machine instance.\n",
    "\n",
    "Docker uses a simple file called a `Dockerfile` to specify how the image is assembled. An example is provided below. You can build your Docker images based on Docker images built by yourself or by others, which can simplify things quite a bit.\n",
    "\n",
    "Docker has become very popular in programming and devops communities due to its flexibility and its well-defined specification of how code can be run in its containers. It is the underpinning of many services built in the past few years, such as [Amazon ECS].\n",
    "\n",
    "Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms.\n",
    "\n",
    "In Amazon SageMaker, Docker containers are invoked in a one way for training and another, slightly different, way for hosting. The following sections outline how to build containers for the SageMaker environment.\n",
    "\n",
    "Some helpful links:\n",
    "\n",
    "* [Docker home page](http://www.docker.com)\n",
    "* [Getting started with Docker](https://docs.docker.com/get-started/)\n",
    "* [Dockerfile reference](https://docs.docker.com/engine/reference/builder/)\n",
    "* [`docker run` reference](https://docs.docker.com/engine/reference/run/)\n",
    "\n",
    "[Amazon ECS]: https://aws.amazon.com/ecs/\n",
    "\n",
    "### How Amazon SageMaker runs your Docker container\n",
    "\n",
    "Because you can run the same image in training or hosting, Amazon SageMaker runs your container with the argument `train` or `serve`. How your container processes this argument depends on the container. All SageMaker deep learning framework containers already cover this requirement and will trigger your defined training algorithm and inference code.\n",
    "\n",
    "* If you specify a program as an `ENTRYPOINT` in the Dockerfile, that program will be run at startup and its first argument will be `train` or `serve`. The program can then look at that argument and decide what to do. The original `ENTRYPOINT` specified within the SageMaker PyTorch is [here](https://github.com/aws/sagemaker-pytorch-container/blob/master/docker/0.4.0/final/Dockerfile.cpu#L18).\n",
    "\n",
    "#### Running your container during training\n",
    "\n",
    "Currently, our SageMaker PyTorch container utilizes [console_scripts](http://python-packaging.readthedocs.io/en/latest/command-line-scripts.html#the-console-scripts-entry-point) to make use of the `train` command issued at training time. The line that gets invoked during `train` is defined within the setup.py file inside [SageMaker Containers](https://github.com/aws/sagemaker-containers/blob/master/setup.py#L48), our common SageMaker deep learning container framework. When this command is run, it will invoke the [trainer class](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/cli/train.py) to run, which will finally invoke our [PyTorch container code](https://github.com/aws/sagemaker-pytorch-container/blob/master/src/sagemaker_pytorch_container/training.py) to run your Python file.\n",
    "\n",
    "A number of files are laid out for your use, under the `/opt/ml` directory:\n",
    "\n",
    "    /opt/ml\n",
    "    ├── input\n",
    "    │   ├── config\n",
    "    │   │   ├── hyperparameters.json\n",
    "    │   │   └── resourceConfig.json\n",
    "    │   └── data\n",
    "    │       └── <channel_name>\n",
    "    │           └── <input data>\n",
    "    ├── model\n",
    "    │   └── <model files>\n",
    "    └── output\n",
    "        └── failure\n",
    "\n",
    "##### The input\n",
    "\n",
    "* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values are always strings, so you may need to convert them. `resourceConfig.json` is a JSON-formatted file that describes the network layout used for distributed training.\n",
    "* `/opt/ml/input/data/<channel_name>/` (for File mode) contains the input data for that channel. The channels are created based on the call to CreateTrainingJob but it's generally important that channels match algorithm expectations. The files for each channel are copied from S3 to this directory, preserving the tree structure indicated by the S3 key structure. \n",
    "* `/opt/ml/input/data/<channel_name>_<epoch_number>` (for Pipe mode) is the pipe for a given epoch. Epochs start at zero and go up by one each time you read them. There is no limit to the number of epochs that you can run, but you must close each pipe before reading the next epoch.\n",
    "\n",
    "##### The output\n",
    "\n",
    "* `/opt/ml/model/` is the directory where you write the model that your algorithm generates. Your model can be in any format that you want. It can be a single file or a whole directory tree. SageMaker packages any files in this directory into a compressed tar archive file. This file is made available at the S3 location returned in the `DescribeTrainingJob` result.\n",
    "* `/opt/ml/output` is a directory where the algorithm can write a file `failure` that describes why the job failed. The contents of this file are returned in the `FailureReason` field of the `DescribeTrainingJob` result. For jobs that succeed, there is no reason to write this file as it is ignored.\n",
    "\n",
    "#### Running your container during hosting\n",
    "\n",
    "Hosting has a very different model than training because hosting is reponding to inference requests that come in via HTTP. Currently, the SageMaker PyTorch containers [uses](https://github.com/aws/sagemaker-pytorch-container/blob/master/src/sagemaker_pytorch_container/serving.py#L103) our [recommended Python serving stack](https://github.com/aws/sagemaker-containers/blob/master/src/sagemaker_containers/_server.py#L44) to provide robust and scalable serving of inference requests:\n",
    "\n",
    "![Request serving stack](stack.png)\n",
    "\n",
    "Amazon SageMaker uses two URLs in the container:\n",
    "\n",
    "* `/ping` receives `GET` requests from the infrastructure. Your program returns 200 if the container is up and accepting requests.\n",
    "* `/invocations` is the endpoint that receives client inference `POST` requests. The format of the request and the response is up to the algorithm. If the client supplied `ContentType` and `Accept` headers, these are passed in as well. \n",
    "\n",
    "The container has the model files in the same place that they were written to during training:\n",
    "\n",
    "    /opt/ml\n",
    "    └── model\n",
    "        └── <model files>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The parts of the sample container\n",
    "\n",
    "The `container` directory has all the components you need to extend the SageMaker PyTorch container to use as an sample algorithm:\n",
    "\n",
    "    .\n",
    "    ├── Dockerfile\n",
    "    ├── build_and_push.sh\n",
    "    └── cifar10\n",
    "        ├── cifar10.py\n",
    "\n",
    "Let's discuss each of these in turn:\n",
    "\n",
    "* __`Dockerfile`__ describes how to build your Docker container image. More details are provided below.\n",
    "* __`build_and_push.sh`__ is a script that uses the Dockerfile to build your container images and then pushes it to ECR. We invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.\n",
    "* __`cifar10`__ is the directory which contains our user code to be invoked.\n",
    "\n",
    "In this simple application, we install only one file in the container. You may only need that many, but if you have many supporting routines, you may wish to install more.\n",
    "\n",
    "The files that we put in the container are:\n",
    "\n",
    "* __`cifar10.py`__ is the program that implements our training algorithm and handles loading our model for inferences."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Dockerfile\n",
    "\n",
    "The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A Docker container running is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations. \n",
    "\n",
    "We start from the SageMaker PyTorch image as the base. The base image is an ECR image, so it will have the following pattern.\n",
    "* {account}.dkr.ecr.{region}.amazonaws.com/sagemaker-{framework}:{framework_version}-{processor_type}-{python_version}\n",
    "\n",
    "Here is an explanation of each field.\n",
    "1. account - AWS account ID the ECR image belongs to. Our public deep learning framework images are all under the 520713654638 account.\n",
    "2. region - The region the ECR image belongs to. [Available regions](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/).\n",
    "3. framework - The deep learning framework.\n",
    "4. framework_version - The version of the deep learning framework.\n",
    "5. processor_type - CPU or GPU.\n",
    "6. python_version - The supported version of Python.\n",
    "\n",
    "So the SageMaker PyTorch ECR image would be:\n",
    "520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-pytorch:0.4.0-cpu-py3\n",
    "\n",
    "Information on supported frameworks and versions can be found in this [README](https://github.com/aws/sagemaker-python-sdk).\n",
    "\n",
    "Next, we add the code that implements our specific algorithm to the container and set up the right environment for it to run under.\n",
    "\n",
    "**DISCLAIMER: As of now, the support for the two environment variables below are only supported for the SageMaker Chainer (4.1.0+) and PyTorch (0.4.0+) containers.**\n",
    "\n",
    "Finally, we need to specify two environment variables.\n",
    "1. SAGEMAKER_SUBMIT_DIRECTORY - the directory within the container containing our Python script for training and inference.\n",
    "2. SAGEMAKER_PROGRAM - the Python script that should be invoked for training and inference.\n",
    "\n",
    "Let's look at the Dockerfile for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Copyright 2017-2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.\r\n",
      "#\r\n",
      "# Licensed under the Apache License, Version 2.0 (the \"License\"). You\r\n",
      "# may not use this file except in compliance with the License. A copy of\r\n",
      "# the License is located at\r\n",
      "#\r\n",
      "#     http://aws.amazon.com/apache2.0/\r\n",
      "#\r\n",
      "# or in the \"license\" file accompanying this file. This file is\r\n",
      "# distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF\r\n",
      "# ANY KIND, either express or implied. See the License for the specific\r\n",
      "# language governing permissions and limitations under the License.\r\n",
      "\r\n",
      "# For more information on creating a Dockerfile\r\n",
      "# https://docs.docker.com/compose/gettingstarted/#step-2-create-a-dockerfile\r\n",
      "# https://github.com/awslabs/amazon-sagemaker-examples/master/advanced_functionality/pytorch_extending_our_containers/pytorch_extending_our_containers.ipynb\r\n",
      "# SageMaker PyTorch image\r\n",
      "FROM 520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-pytorch:0.4.0-cpu-py3\r\n",
      "\r\n",
      "ENV PATH=\"/opt/ml/code:${PATH}\"\r\n",
      "\r\n",
      "# /opt/ml and all subdirectories are utilized by SageMaker, we use the /code subdirectory to store our user code.\r\n",
      "COPY /cifar10 /opt/ml/code\r\n",
      "\r\n",
      "# this environment variable is used by the SageMaker PyTorch container to determine our user code directory.\r\n",
      "ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code\r\n",
      "\r\n",
      "# this environment variable is used by the SageMaker PyTorch container to determine our program entry point\r\n",
      "# for training and serving.\r\n",
      "# For more information: https://github.com/aws/sagemaker-pytorch-container\r\n",
      "ENV SAGEMAKER_PROGRAM cifar10.py"
     ]
    }
   ],
   "source": [
    "!cat container/Dockerfile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building and registering the container\n",
    "\n",
    "The following shell code shows how to build the container image using `docker build` and push the container image to ECR using `docker push`. This code is also available as the shell script `container/build-and-push.sh`, which you can run as `build-and-push.sh pytorch-extending-our-containers-cifar10-example` to build the image `pytorch-extending-our-containers-cifar10-example`. \n",
    "\n",
    "This code looks for an ECR repository in the account you're using and the current default region (if you're using a SageMaker notebook instance, this is the region where the notebook instance was created). If the repository doesn't exist, the script will create it. In addition, since we are using the SageMaker PyTorch image as the base, we will need to retrieve ECR credentials to pull this public image."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sh\n",
    "\n",
    "# The name of our algorithm\n",
    "algorithm_name=pytorch-extending-our-containers-cifar10-example\n",
    "\n",
    "cd container\n",
    "\n",
    "account=$(aws sts get-caller-identity --query Account --output text)\n",
    "\n",
    "# Get the region defined in the current configuration (default to us-west-2 if none defined)\n",
    "region=$(aws configure get region)\n",
    "region=${region:-us-west-2}\n",
    "\n",
    "fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\"\n",
    "\n",
    "# If the repository doesn't exist in ECR, create it.\n",
    "\n",
    "aws ecr describe-repositories --repository-names \"${algorithm_name}\" > /dev/null 2>&1\n",
    "\n",
    "if [ $? -ne 0 ]\n",
    "then\n",
    "    aws ecr create-repository --repository-name \"${algorithm_name}\" > /dev/null\n",
    "fi\n",
    "\n",
    "# Get the login command from ECR and execute it directly\n",
    "$(aws ecr get-login --region ${region} --no-include-email)\n",
    "\n",
    "# Get the login command from ECR in order to pull down the SageMaker PyTorch image\n",
    "$(aws ecr get-login --registry-ids 520713654638 --region ${region} --no-include-email)\n",
    "\n",
    "# Build the docker image locally with the image name and then push it to ECR\n",
    "# with the full name.\n",
    "\n",
    "docker build  -t ${algorithm_name} . --build-arg REGION=${region}\n",
    "docker tag ${algorithm_name} ${fullname}\n",
    "\n",
    "docker push ${fullname}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Testing your algorithm on your local machine\n",
    "\n",
    "When you're packaging your first algorithm to use with Amazon SageMaker, you probably want to test it yourself to make sure it's working correctly. We use the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to test both locally and on SageMaker. For more examples with the SageMaker Python SDK, see [Amazon SageMaker Examples](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk). In order to test our algorithm, we need our dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download the CIFAR-10 dataset\n",
    "We will be utilizing the CIFAR10 dataset loader provided within PyTorch to download and load our data for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from utils.utils_cifar import get_train_data_loader, get_test_data_loader, imshow, classes\n",
    "\n",
    "trainloader = get_train_data_loader('/tmp/pytorch-example/cifar-10-data')\n",
    "testloader = get_test_data_loader('/tmp/pytorch-example/cifar-10-data')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# There should be the original tar file along with the extracted data directory. (cifar-10-python.tar.gz, cifar-10-batches-py)\n",
    "! ls /tmp/pytorch-example/cifar-10-data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SageMaker Python SDK Local Training\n",
    "To represent our training, we use the Estimator class, which needs to be configured in five steps. \n",
    "1. IAM role - our AWS execution role\n",
    "2. train_instance_count - number of instances to use for training.\n",
    "3. train_instance_type - type of instance to use for training. For training locally, we specify `local` or `local_gpu`.\n",
    "4. image_name - our custom PyTorch Docker image we created.\n",
    "5. hyperparameters - hyperparameters we want to pass.\n",
    "\n",
    "Let's start with setting up our IAM role. We make use of a helper function within the Python SDK. This function throw an exception if run outside of a SageMaker notebook instance, as it gets metadata from the notebook instance. If running outside, you must provide an IAM role with proper access stated above in [Permissions](#Permissions)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker import get_execution_role\n",
    "\n",
    "role = get_execution_role()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Fit, Deploy, Predict\n",
    "\n",
    "Now that the rest of our estimator is configured, we can call `fit()` with the path to our local CIFAR10 dataset prefixed with `file://`. This invokes our PyTorch container with 'train' and passes in our hyperparameters and other metadata as json files in /opt/ml/input/config within the container to our program entry point defined in the Dockerfile.\n",
    "\n",
    "After our training has succeeded, our training algorithm outputs our trained model within the /opt/ml/model directory, which is used to handle predictions.\n",
    "\n",
    "We can then call `deploy()` with an instance_count and instance_type, which is 1 and `local`. This invokes our PyTorch container with 'serve', which setups our container to handle prediction requests as defined [here](https://github.com/aws/sagemaker-pytorch-container/blob/master/src/sagemaker_pytorch_container/serving.py#L103). What is returned is a predictor, which is used to make inferences against our trained model.\n",
    "\n",
    "After our prediction, we can delete our endpoint.\n",
    "\n",
    "We recommend testing and training your training algorithm locally first, as it provides quicker iterations and better debuggability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Lets set up our SageMaker notebook instance for local mode.\n",
    "!/bin/bash ./utils/setup.sh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import subprocess\n",
    "\n",
    "instance_type = 'local'\n",
    "\n",
    "if subprocess.call('nvidia-smi') == 0:\n",
    "    ## Set type to GPU if one is present\n",
    "    instance_type = 'local_gpu'\n",
    "    \n",
    "print(\"Instance type = \" + instance_type)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.estimator import Estimator\n",
    "\n",
    "hyperparameters = {'epochs': 1}\n",
    "\n",
    "estimator = Estimator(role=role,\n",
    "                      train_instance_count=1,\n",
    "                      train_instance_type=instance_type,\n",
    "                      image_name='pytorch-extending-our-containers-cifar10-example:latest',\n",
    "                      hyperparameters=hyperparameters)\n",
    "\n",
    "estimator.fit('file:///tmp/pytorch-example/cifar-10-data')\n",
    "\n",
    "predictor = estimator.deploy(1, instance_type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Making predictions using Python SDK\n",
    "\n",
    "To make predictions, we will use a few images, from the test loader, converted into a json format to send as an inference request.\n",
    "\n",
    "The reponse will be tensors containing the probabilities of each image belonging to one of the 10 classes. Based on the highest probability we will map that index to the corresponding class in our output. The classes can be referenced from the [CIFAR-10 website](https://www.cs.toronto.edu/~kriz/cifar.html). Since we didn't train the model for that long, we aren't expecting very accurate results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torchvision, torch\n",
    "import numpy as np\n",
    "\n",
    "from sagemaker.predictor import json_serializer, json_deserializer\n",
    "\n",
    "# get some test images\n",
    "dataiter = iter(testloader)\n",
    "images, labels = dataiter.next()\n",
    "\n",
    "# print images\n",
    "imshow(torchvision.utils.make_grid(images))\n",
    "print('GroundTruth: ', ' '.join('%4s' % classes[labels[j]] for j in range(4)))\n",
    "\n",
    "predictor.accept = 'application/json'\n",
    "predictor.content_type = 'application/json'\n",
    "\n",
    "predictor.serializer = json_serializer\n",
    "predictor.deserializer = json_deserializer\n",
    "\n",
    "outputs = predictor.predict(images.numpy())\n",
    "\n",
    "_, predicted = torch.max(torch.from_numpy(np.array(outputs)), 1)\n",
    "\n",
    "print('Predicted: ', ' '.join('%4s' % classes[predicted[j]]\n",
    "                              for j in range(4)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2: Training and Hosting your Algorithm in Amazon SageMaker\n",
    "Once you have your container packaged, you can use it to train and serve models. Let's do that with the algorithm we made above.\n",
    "\n",
    "## Set up the environment\n",
    "Here we specify the bucket to use and the role that is used for working with SageMaker."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# S3 prefix\n",
    "prefix = 'DEMO-pytorch-cifar10'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the session\n",
    "\n",
    "The session remembers our connection parameters to SageMaker. We use it to perform all of our SageMaker operations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sagemaker as sage\n",
    "\n",
    "sess = sage.Session()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Upload the data for training\n",
    "\n",
    "We will use the tools provided by the SageMaker Python SDK to upload the data to a default bucket."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "WORK_DIRECTORY = '/tmp/pytorch-example/cifar-10-data'\n",
    "\n",
    "data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Training on SageMaker\n",
    "Training a model on SageMaker with the Python SDK is done in a way that is similar to the way we trained it locally. This is done by changing our train_instance_type from `local` to one of our [supported EC2 instance types](https://aws.amazon.com/sagemaker/pricing/instance-types/).\n",
    "\n",
    "In addition, we must now specify the ECR image URL, which we just pushed above.\n",
    "\n",
    "Finally, our local training dataset has to be in Amazon S3 and the S3 URL to our dataset is passed into the `fit()` call.\n",
    "\n",
    "Let's first fetch our ECR image url that corresponds to the image we just built and pushed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "\n",
    "client = boto3.client('sts')\n",
    "account = client.get_caller_identity()['Account']\n",
    "\n",
    "my_session = boto3.session.Session()\n",
    "region = my_session.region_name\n",
    "\n",
    "algorithm_name = 'pytorch-extending-our-containers-cifar10-example'\n",
    "\n",
    "ecr_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account, region, algorithm_name)\n",
    "\n",
    "print(ecr_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sagemaker.estimator import Estimator\n",
    "\n",
    "hyperparameters = {'epochs': 1}\n",
    "\n",
    "instance_type = 'ml.m4.xlarge'\n",
    "\n",
    "estimator = Estimator(role=role,\n",
    "                      train_instance_count=1,\n",
    "                      train_instance_type=instance_type,\n",
    "                      image_name=ecr_image,\n",
    "                      hyperparameters=hyperparameters)\n",
    "\n",
    "estimator.fit(data_location)\n",
    "\n",
    "predictor = estimator.deploy(1, instance_type)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# get some test images\n",
    "dataiter = iter(testloader)\n",
    "images, labels = dataiter.next()\n",
    "\n",
    "# print images\n",
    "imshow(torchvision.utils.make_grid(images))\n",
    "print('GroundTruth: ', ' '.join('%4s' % classes[labels[j]] for j in range(4)))\n",
    "\n",
    "predictor.accept = 'application/json'\n",
    "predictor.content_type = 'application/json'\n",
    "\n",
    "predictor.serializer = json_serializer\n",
    "predictor.deserializer = json_deserializer\n",
    "\n",
    "outputs = predictor.predict(images.numpy())\n",
    "\n",
    "_, predicted = torch.max(torch.from_numpy(np.array(outputs)), 1)\n",
    "\n",
    "print('Predicted: ', ' '.join('%4s' % classes[predicted[j]]\n",
    "                              for j in range(4)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optional cleanup\n",
    "When you're done with the endpoint, you should clean it up.\n",
    "\n",
    "All of the training jobs, models and endpoints we created can be viewed through the SageMaker console of your AWS account."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predictor.delete_endpoint()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reference\n",
    "- [How Amazon SageMaker interacts with your Docker container for training](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html)\n",
    "- [How Amazon SageMaker interacts with your Docker container for inference](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html)\n",
    "- [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)\n",
    "- [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk)\n",
    "- [Dockerfile](https://docs.docker.com/engine/reference/builder/)\n",
    "- [scikit-bring-your-own](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb)\n",
    "- [SageMaker PyTorch container](https://github.com/aws/sagemaker-pytorch-container)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_pytorch_p36",
   "language": "python",
   "name": "conda_pytorch_p36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
